	Gad! I  clobbered  your previous  mail by  trying  to be  too
clever, to  the point where recovery is  impossible. You had one note
from BPM  saying  that  there were  no  more paper  versions  of  the
document,  but that  microfiche versions  would soon  be around,  and
could be gotten from Patte. Fortunately there was nothing else. 

	The interconnection problem is solved better than it was ever
solved before!   The solution  is actually in  the 1968 SJCC  Batcher
paper.  His merge sorting  net is more expensive than the permutation
net in the Waksman paper, but ever so much easier to use.  The answer
to  how to  make a  communications  net where  the  requests are  not
necessarily  a  perfect  permutation  is  towards  the  end,  in  the
paragraph labelled,  appropriately  enough, "Switching  network  with
conflict resolution". For n processors, his method involves a sorting
network  for n items connected to a  merging network which merges the
sorted requests  with n  additional dummy requests,  exactly one  for
each processor. The real  messages and the dummy requests are shipped
out through this, leaving a trace of their path. They come out of the
merger with all the  messages to a given processor  clumped together,
in order of a priority number (which constitutes the low order
address bits, as far as the switches are concerned), with the dummy
request for each processor heading each clump (because the dummy
requests are given the highest possible priority number). A circuit
now looks at adjacent pairs of messages in this list and exchanges
each dummy request with the message immediately following it, if that
message is for the same address as the dummy (such a message, if it
exists at all, is the highest priority real request for that
destination).  The net
is then  run backwards, with messages taking  the same path backwards
as they took  forwards. The successful transfers  come out where  the
dummy  requests went  in, the  machines which  issued the  successful
transfers  getting the dummy (thus being  informed of their success),
whereas unsuccessful  requests are  simply  returned to  sender.   If
nobody wanted  to talk to a  particular processor, it  is consoled by
receiving a dummy message. 
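
	To convince myself I had the conflict resolution step right, I
wrote the bookkeeping out as a little program (an ordinary sort stands
in for the comparator net, and the request format and the
lowest-number-wins priority rule are my own assumptions, not anything
taken from the paper):

def resolve_conflicts(requests, n_processors):
    # Software sketch of the conflict resolution step: `requests` is a
    # list of (source, destination) pairs; lower source number is taken
    # as higher priority here (my assumption, just to have a tie-break).
    # Returns (winners, losers): winners maps destination -> winning
    # source, losers are the requests that get returned to sender.

    # One dummy request per destination, with the highest possible
    # priority (-1) so it sorts ahead of every real request for it.
    dummies = [(-1, d) for d in range(n_processors)]

    # The sorter plus merger: order everything by destination, then by
    # priority, so each dummy heads its clump of real requests.
    merged = sorted(requests + dummies, key=lambda r: (r[1], r[0]))

    winners, losers = {}, []
    for i, (src, dst) in enumerate(merged):
        if src == -1:
            # The message right after a dummy, if it is for the same
            # destination, is the highest priority real request for it.
            if i + 1 < len(merged) and merged[i + 1][1] == dst:
                winners[dst] = merged[i + 1][0]
        elif winners.get(dst) != src:
            losers.append((src, dst))
    return winners, losers

# Processors 0 and 2 both want processor 5; only the higher priority
# request gets through, the other is returned to sender.
print(resolve_conflicts([(0, 5), (2, 5), (1, 3)], 8))
# -> ({3: 1, 5: 0}, [(2, 5)])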

	This is quite nice, but it has the problem that the net can't
be pipelined, since it has to freeze in a given state until the
messages filter back.  This can be gotten around by sticking a few
extra bits into each message, recording its routing decisions as it
filters through the net, and building a second net, a sort of mirror
image of the first, after the two way interchangers, which uses these
extra bits to guide the messages to the right place, allowing the
messages to be fed in end to end.
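
	Roughly, the trick looks like this (the net is modelled as a
fixed list of compare-exchange stages, hard-coded here for four
inputs; the dummy and conflict machinery above is left out, and the
wiring and the names are just my own illustration):

def forward(messages, stages):
    # Push messages through a net of compare-exchange stages, appending
    # one decision bit per switch to each message that passes through it.
    msgs = [dict(m, bits=[]) for m in messages]
    for stage in stages:                      # stage = list of (i, j) pairs
        for i, j in stage:
            swap = msgs[i]['dst'] > msgs[j]['dst']
            msgs[i]['bits'].append(swap)      # both sides remember the
            msgs[j]['bits'].append(swap)      # setting of this switch
            if swap:
                msgs[i], msgs[j] = msgs[j], msgs[i]
    return msgs

def mirror(arrived, stages):
    # The mirror image net: undo the stages in reverse order, steering
    # each message by its recorded bits instead of by fresh comparisons,
    # so the forward net never has to freeze and wait.
    msgs = list(arrived)
    for stage in reversed(stages):
        for i, j in stage:
            bi = msgs[i]['bits'].pop()
            bj = msgs[j]['bits'].pop()
            assert bi == bj                   # both halves saw one switch
            if bi:
                msgs[i], msgs[j] = msgs[j], msgs[i]
    return msgs

# A 4-input sorting net (Batcher's odd-even merge), hard-coded as an
# illustration: three stages of disjoint compare-exchange pairs.
STAGES = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]

out = forward([{'src': s, 'dst': d} for s, d in enumerate([3, 1, 2, 0])],
              STAGES)
print([m['dst'] for m in out])                  # -> [0, 1, 2, 3], delivered
print([m['src'] for m in mirror(out, STAGES)])  # -> [0, 1, 2, 3], back home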

	The multiple  access memory construction in  the next section
of the paper looks  perfect for a  bubble memory organization  (after
all, if the bubbles  are going to be shifted around  and around, they
might   as  well  do   something  useful   as  they  move)   for  the
multiprocessor, allowing simultaneous access by all the processors to
the entire mass storage.  This is for later, of course. 

	It occurred to  me that Knuth might have  something about all
this in Vol 3 (Sorting and Searching), and, sure enough, section 5.3.4
is called Networks  for Sorting, and  talks about Batcher's nets  and
others. The conclusion there  is that slightly cheaper solutions than
his are possible, but construction  of these is complicated, and  the
result is messy.  I'm currently reading some of Van Voorhis' papers
(referred to by Knuth) on nets with fewer elements, but it looks like
bitonic sorters really are the best bet for actual implementation,
mainly because the delay time for all paths through them is the same.
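
	The bitonic sorter itself is simple enough to write down in a
few lines (power-of-two input sizes assumed).  Each pass of the inner
loops corresponds to one rank of comparators, and every input goes
through all (lg n)(lg n + 1)/2 of them, which is exactly where the
uniform delay comes from:

def bitonic_sort(a):
    # Batcher's bitonic sorter, power-of-two input sizes only.  Each
    # (k, j) pass is one rank of comparators; every element goes
    # through every rank, so all paths through the net are equally long.
    a = list(a)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "power-of-two sizes only"
    k = 2
    while k <= n:             # size of the bitonic runs being merged
        j = k // 2
        while j >= 1:         # comparator span within this merge
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([5, 3, 7, 1, 4, 6, 0, 2]))   # -> [0, 1, 2, 3, 4, 5, 6, 7]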